Unsupervised Clustering of Morphologically Related Chinese Words
Authors
Abstract
Many linguists consider morphological awareness a major factor that affects children’s reading development. A Chinese character embedded in different compound words may carry related but different meanings. For example, “商店(store)”, “商品(commodity)”, “商代(Shang Dynasty)”, and “商朝(Shang Dynasty)” can form two clusters: {“商店”, “商品”} and {“商代”, “商朝”}. In this paper, we aim at unsupervised clustering of a given family of morphologically related Chinese words. Successfully differentiating these words can contribute to both computer-assisted Chinese learning and natural language understanding. In Experiment 1, we employed linguistic factors at the word, syntactic, semantic, and contextual levels in aggregated computational linguistics methods to handle the clustering task. In Experiment 2, we recruited adults and children to perform the clustering task. Experimental results indicate that our computational model achieved the same level of performance as children.
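The abstract does not say which clustering algorithm was used, so the following is only a minimal sketch of the task itself: grouping the four example words by average-link agglomerative clustering. The similarity scores, the `cluster` function, and the 0.5 threshold are all hypothetical values chosen for illustration; they are not taken from the paper, whose model derives similarity from word, syntactic, semantic, and contextual features.

```python
from itertools import combinations

# Hypothetical pairwise similarity scores, made up for illustration only.
# They are NOT results or features from the paper.
TOY_SIMILARITY = {
    frozenset(("商店", "商品")): 0.8,
    frozenset(("商代", "商朝")): 0.9,
    frozenset(("商店", "商代")): 0.1,
    frozenset(("商店", "商朝")): 0.1,
    frozenset(("商品", "商代")): 0.2,
    frozenset(("商品", "商朝")): 0.1,
}


def similarity(a, b):
    """Look up the (symmetric) toy similarity between two words."""
    return TOY_SIMILARITY[frozenset((a, b))]


def average_link(c1, c2):
    """Mean pairwise similarity between the members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def cluster(words, threshold):
    """Merge the most similar clusters until no pair reaches the threshold."""
    clusters = [frozenset([w]) for w in words]
    while len(clusters) > 1:
        best_score, best_pair = max(
            ((average_link(c1, c2), (c1, c2))
             for c1, c2 in combinations(clusters, 2)),
            key=lambda item: item[0],
        )
        if best_score < threshold:
            break  # remaining clusters are too dissimilar to merge
        c1, c2 = best_pair
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)


result = cluster(["商店", "商品", "商代", "商朝"], threshold=0.5)
for group in result:
    print(sorted(group))
```

With these toy scores the procedure stops at the two clusters given in the abstract's example, {“商店”, “商品”} and {“商代”, “商朝”}. A stopping threshold is used here, rather than a fixed number of clusters, because the task as described does not give the number of senses in advance.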
Related papers
Semantical Clustering of Morphologically Related Chinese Words
A Chinese character embedded in different compound words may carry different meanings. In this paper, we aim at semantic clustering of a given family of morphologically related Chinese words. In Experiment 1, we employed linguistic features at the word, syntactic, semantic, and contextual levels in aggregated computational linguistics methods to handle the clustering task. In Experiment 2, we r...
Semantic Clustering of Morphologically Related Chinese Words
A Chinese character embedded in different compound words may carry different meanings. In this paper, we aim at semantic clustering of a given family of morphologically related Chinese words. In Experiment 1, we employed linguistic features at the word, syntactic, semantic, and contextual levels in aggregated computational linguistics methods to handle the clustering task. In Experiment 2, we r...
Unsupervised Sense Clustering of Related Chinese Words
Chinese words which share the same character may carry related but different meanings, e.g., “花錢(spend)”, “花費(expend)”, “花園(garden)”, “開花(bloom)”. The semantics of these words form two clusters: {“花錢(spend)”, “花費(expend)”} and {“花園(garden)”, “開花(bloom)”}. In this paper, we aim at unsupervised clustering of a given set of such related Chinese words, where the quality of clustering results is t...
An Unsupervised Approach to Chinese Word Sense Disambiguation Based on Hownet
The research on word sense disambiguation (WSD) has great theoretical and practical significance in many fields of natural language processing (NLP). This paper presents an unsupervised approach to Chinese word sense disambiguation based on Hownet (an electronic Chinese lexical resource). In our approach, contexts that include ambiguous words are converted into vectors by means of a second-orde...
Statistical Stemming for Kannada
Stemming is a process that groups morphologically related words into the same class and is widely used in information retrieval for improving recall rate. Here we study a set of statistical stemmers for Kannada, a resource-poor language with highly inflectional and agglutinative morphology. We compare stemming using simple truncation, clustering and an unsupervised morpheme segmentation algorit...